fix(server): avoid extracting text range filters#3034
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a traversal query-planning bug where range predicates (gt/gte/lt/lte) on Text properties were being extracted/pushed down into backend queries, leading to incorrect behavior (and the exception reported in #2935). The change keeps these text range predicates in the traversal layer while adding a regression test to ensure direct traversals behave consistently with equivalent match() traversals.
Changes:
- Prevent extraction of
HasStepcontainers when they include range comparisons onTextproperties. - Keep existing extraction behavior for system properties and non-text property predicates (intended).
- Add a regression test reproducing the traversal shape from #2935 and asserting consistent results between direct and
match()forms.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java | Adds a guard to stop extracting HasContainers with text range predicates into backend query steps. |
| hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/core/CountStrategyCoreTest.java | Adds a regression test covering a text range filter followed by repeat(...).emit().times(1).count(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…fix/fix-text-range-filter # Conflicts: # hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3034 +/- ##
============================================
- Coverage 35.94% 31.36% -4.58%
- Complexity 338 500 +162
============================================
Files 803 814 +11
Lines 68053 69240 +1187
Branches 8907 9113 +206
============================================
- Hits 24460 21720 -2740
- Misses 40968 45118 +4150
+ Partials 2625 2402 -223 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
imbajin
left a comment
There was a problem hiding this comment.
The original correctness fix is clear, but the latest version now broadens the PR into a more aggressive HasStep split-and-remove optimization. I left one scope/risk comment suggesting we keep this PR on the minimal safe path and move the broader optimization to a follow-up.
| if (!GraphStep.processHasContainerIds(newStep, has)) { | ||
| newStep.addHasContainer(has); | ||
| } | ||
| holder.removeHasContainer(has); |
There was a problem hiding this comment.
has("vp4", P.lt("")) should not be pushed into the backend query path. The latest implementation now goes further and splits a mixed HasStep, pushes selected containers down, removes them from the original HasStep, and adds new label/index-availability logic in TraversalUtil.
Original bug boundary
Text range predicate
|
v
wrongly pushed to backend
|
v
exception / mismatch with match()
Latest implementation boundary
HasStep(vp4 < "", age = 1)
|
+-- push age = 1 into HugeGraphStep / HugeVertexStep
|
+-- remove age = 1 from the original HasStep
|
+-- rely on new label/index checks to prove backend can enforce it
That is a much broader optimizer change than the original fix, and once a container is removed from the original HasStep, correctness depends on canExtractHasContainer() exactly matching what the backend planner can really execute. The new tests cover several cases, but this still expands the review surface into composite-index matching, label context, paging/sysprop interactions, and future index-planner behavior.
For this PR, I would prefer one of these smaller and safer shapes:
Option 1: minimal correctness fix
- keep unsafe Text + range predicates in the traversal layer
- do not split mixed HasSteps
- avoid adding backend index-availability logic here
Option 2: low-risk partial prefiltering
- optionally push safe containers down as redundant backend pre-filters
- keep the original HasStep unchanged unless all containers are extractable
- correctness remains guarded by the full traversal-layer filter
Safer partial prefiltering shape
Backend prefilter:
age = 1
Traversal filter still intact:
vp4 < "" AND age = 1
Result:
backend may reduce candidates, but traversal still owns correctness
The current split-and-remove optimization can still be valuable, but I think it should be a separate follow-up PR with dedicated tests for composite-index prefix matching, label-less queries, ~page / paging semantics when some user filters remain in traversal, and parity with the backend index planner. Keeping this PR minimal would make the #2935 fix easier to reason about and safer to merge.
There was a problem hiding this comment.
Okay, sorry. I was too obsessed with fixing the review issue of copilot earlier, which led to a slight deviation in direction. I'll adjust my focus now.
There was a problem hiding this comment.
Thanks for adjusting the direction. To clarify my previous comment: I do not mean that we should choose freely between the two shapes. For this PR, I would prefer the minimal option A.
The original issue is small and focused: Text range predicates should not be extracted into the backend query path. So the safest fix here is to add the narrow condition that blocks only that unsafe case, while keeping the existing extraction behavior unchanged for the rest.
Preferred shape for this PR
HasStep(vp4 < "", age = 1)
|
v
if any extracted container is Text + range:
keep this HasStep in traversal layer
avoid backend text-range extraction
No new index-planner logic
No mixed HasStep split/remove behavior
No broader optimizer contract change
This may leave a small performance loss for the uncommon case where one has() step mixes a text-range predicate with other safe predicates, but it keeps the patch proportional to the bug and avoids introducing new correctness risks around label context, index matching, composite index prefixes, paging, or unique/search/numeric index behavior.
PS: the more aggressive split-and-remove optimization can still be considered later
There was a problem hiding this comment.
Thanks for the clarification. I agree with option A for this PR.
I have scoped the local changes back to the minimal correctness fix:
- no mixed
HasStepsplit/remove behavior - no new index-planner or label/index availability logic
- if a
HasStepcontains aTextrange predicate (gt/gte/lt/lte), keep the wholeHasStepin the traversal layer - keep the existing extraction behavior for other cases as much as possible
This intentionally accepts the small potential performance loss for mixed HasSteps, so the fix stays proportional to #2935. I’ll push the narrowed patch after one final local check.
There was a problem hiding this comment.
Now this fix adopts a conservative strategy. When encountering the same HasStep mixed with text range and other security conditions, the entire HasStep will be retained, possibly with less backend push-down optimization
Purpose of the PR
This PR fixes the query-planning path exposed by #2935.
For top-level traversals like
g.V().has("vp4", P.lt("")).repeat(...).count(),HugeGraph may extract the text range predicate into a backend query. However,
range predicates on text properties should not be planned as backend range
index queries. This can make the direct traversal fail while an equivalent
match()traversal returns normally.This change keeps such text range predicates in the traversal layer instead of
pushing them down to the backend query, making the direct traversal consistent
with the equivalent
match()form.Main Changes
Textproperty predicates withgt/gte/lt/lteintoHugeGraphStep/HugeVertexStepbackend queries.predicates.
has("vp4", P.lt("")).repeat(__.out("el2")).emit().times(1).count().match()traversal.Verifying these changes
-
mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" testmvn -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" testDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need